XML parsers are the data processors that underlie XML applications. A validating parser checks that an XML document conforms to a DTD or XML schema. A nonvalidating parser checks only that the document is well-formed, that is, that it follows the syntax rules of the XML 1.0 specification. A nonvalidating parser is sufficient when an application does not need to verify its XML data against a DTD or other schema.
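The difference between the two modes can be sketched with the JAXP API that ships with current JDKs. This is only an illustration under stated assumptions: the class name `ValidationDemo`, the sample documents, and the helper `accepts` are invented for this example, and the parser used is whatever the JDK provides by default, not any specific product described here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; names and sample data are assumptions for illustration.
class ValidationDemo {
    // Well-formed, but <note> holds <body> where the DTD requires <to>.
    static final String INVALID_BUT_WELL_FORMED =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE note [\n"
        + "  <!ELEMENT note (to)>\n"
        + "  <!ELEMENT to (#PCDATA)>\n"
        + "  <!ELEMENT body (#PCDATA)>\n"
        + "]>\n"
        + "<note><body>hello</body></note>";

    // Not well-formed: <to> is never closed.
    static final String NOT_WELL_FORMED = "<note><to>unclosed</note>";

    /** Returns true if the document parses without error in the given mode. */
    static boolean accepts(String xml, boolean validating) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating); // turn DTD validation on or off
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Validity errors are recoverable by default; rethrow them so
            // that an invalid document is rejected rather than just logged.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("nonvalidating, invalid doc:    " + accepts(INVALID_BUT_WELL_FORMED, false));
        System.out.println("validating, invalid doc:       " + accepts(INVALID_BUT_WELL_FORMED, true));
        System.out.println("nonvalidating, malformed doc:  " + accepts(NOT_WELL_FORMED, false));
    }
}
```

In this sketch the nonvalidating mode accepts the first document because it is well-formed, the validating mode rejects it because it violates the DTD, and both modes reject the second document because well-formedness errors are fatal.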
Xerces is an open-source parser from the Apache Software Foundation. Validating parsers are available for both Java and C++; they implement the W3C XML and DOM (Level 1 and 2) standards, as well as the de facto SAX (version 2) standard. Initial support for XML Schema (draft W3C standard) is also provided. A Perl wrapper is provided for the C++ version of Xerces, which allows access to a fully validating DOM XML parser from Perl. The wrapper also provides full access to Unicode strings, since Unicode is a key part of the XML standard.
Microsoft's MSXML parser is COM-based, has full DOM support, and supports XDR (XML-Data Reduced), the subset of the XML-Data schema proposal implemented in Internet Explorer 5. Microsoft's extensions allow you to access typed data and namespace information, create nodes by node type, and more. MSXML ships with Internet Explorer 5 and is also available as a standalone, redistributable COM component.
XP is a nonvalidating parser written in Java. Although it is nonvalidating, it can parse all external entities: external DTD subsets, external parameter entities, and external general entities.
IBM's XML Parser for Java (XML4J) is a validating XML parser written in Java. The package contains classes and methods for parsing, generating, manipulating, and validating XML documents. XML4J supports the September 1999 W3C draft of XML Schema. Version 3.0.3EA3 of XML4J was released in December 1999 and is based on the Apache Xerces XML Parser, which is itself largely based on an older version of XML4J. It adds DOM Level 2, SAX2 (alpha), and parts of the W3C Schema proposal.
IBM's XML for C++ parser (XML4C) integrates Apache's Xerces-C XML parser with IBM's International Components for Unicode (ICU) and extends the number of supported encodings to over 150. It consists of three shared libraries (two code libraries and one data library) that provide classes for parsing, generating, manipulating, and validating XML documents. XML4C conforms to the XML 1.0 Recommendation and associated standards (DOM 1.0, SAX 1.0, DOM 2.0, etc.). The 3.0.1 update adds support for multiply nested entities using relative paths or URLs, as well as more DOM Level 2 features.
Java Project X has produced two parsers that integrate Java with XML: one validating and the other nonvalidating. Java Project X is used as the default XML parser in Sun's JAXP (Java API for XML Parsing, released February 2000); however, the software's pluggable architecture allows any XML-conformant parser to be used, including Apache's Xerces. The JAXP 1.0 release offers 100% conformance to the XML 1.0 specification, SAX 1.0, DOM Level 1 Core, and XML namespaces.
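The pluggable architecture can be illustrated with the factory classes in the `javax.xml.parsers` package of current JDKs. The sketch below is an assumption-laden illustration, not a description of JAXP 1.0 itself: the class name `PluggabilityDemo` and its helper methods are invented, and the implementation classes it reports depend entirely on the JDK and classpath in use.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical demo class; the method names are assumptions for illustration.
class PluggabilityDemo {
    /** Name of the concrete DOM factory that JAXP selected at run time. */
    static String domFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Name of the concrete SAX factory that JAXP selected at run time. */
    static String saxFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Application code names only the abstract factories. The concrete
        // parser can be swapped without recompiling, for example by setting
        // the system property javax.xml.parsers.DocumentBuilderFactory to
        // org.apache.xerces.jaxp.DocumentBuilderFactoryImpl (this assumes
        // the Xerces-J jar is on the classpath).
        System.out.println("DOM factory: " + domFactoryName());
        System.out.println("SAX factory: " + saxFactoryName());
    }
}
```

Because the application never refers to a parser class directly, replacing the default parser is a deployment decision rather than a code change.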
Oracle's XML Parser for Java supports the XML 1.0 specification, with the goal that it be 100% conformant, and can be used as either a validating or a nonvalidating parser. It provides both DOM and SAX APIs for writing custom applications that process XML documents in the Oracle8i environment.
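As a rough illustration of the DOM style of API, the following sketch builds an in-memory tree and then queries it. It uses the standard `org.w3c.dom` interfaces through the JDK's default parser rather than any vendor-specific classes; the class name `DomDemo`, the method `parserNames`, and the sample element and attribute names are all assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; element and attribute names are illustrative only.
class DomDemo {
    /** Returns the "name" attribute of every <parser> element, in document order. */
    static List<String> parserNames(String xml) {
        try {
            // The whole document is read into a tree before any query runs.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("parser");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element parser = (Element) nodes.item(i);
                names.add(parser.getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<parsers><parser name=\"Lark\"/><parser name=\"Larval\"/></parsers>";
        System.out.println(parserNames(xml));
    }
}
```

A DOM tree supports random access and modification at the cost of holding the full document in memory, which is the usual trade-off against an event-based API such as SAX.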
AElfred is an event-driven, nonvalidating XML parser created by David Megginson. It is a client-side XML parsing tool and can be downloaded as part of an applet. Microstar, the original distributor of AElfred, was acquired by Open Text Corporation in September 1999.
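Event-driven parsing means the parser calls back into application code as it encounters each piece of markup instead of building a tree. The sketch below shows the idea with the standard SAX interfaces and the JDK's default parser; it does not use AElfred or its native API, and the class name `SaxEventsDemo` and the method `events` are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it records element events in the order received.
class SaxEventsDemo {
    /** Parses the document and returns a log of start and end element events. */
    static List<String> events(String xml) {
        final List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                log.add("start:" + qName); // called when an opening tag is read
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName); // called when a closing tag is read
            }
        };
        try {
            SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(events("<a><b/></a>"));
    }
}
```

Since nothing is retained unless the handler chooses to keep it, this style suits small footprints such as applets and documents too large to hold in memory.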
Lark and Larval are also Java-based parsers. Lark is a nonvalidating XML parser; Larval is a validating parser built on the original Lark code base.
Copyright 2000 Extensibility, Inc.
Suite 250, 200 Franklin Street, Chapel Hill, North Carolina 27516